{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "acf78056",
   "metadata": {},
   "source": [
    "# Long Context Reorder\n",
    "\n",
    "- Author: [Minji](https://github.com/r14minji)\n",
    "- Peer Review: \n",
    "- Proofread : [jishin86](https://github.com/jishin86)\n",
    "- This is a part of [LangChain OpenTutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/10-Retriever/04-LongContextReorder.ipynb)[![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/10-Retriever/04-LongContextReorder.ipynb)\n",
    "\n",
    "## Overview\n",
    "\n",
    "Regardless of the model's architecture, performance significantly degrades when including more than 10 retrieved documents.\n",
    "\n",
    "Simply put, when the model needs to access relevant information in the middle of a long context, it tends to ignore the provided documents.\n",
    "\n",
    "For more details, please refer to the following paper:\n",
    "\n",
    "- https://arxiv.org/abs/2307.03172\n",
    "\n",
    "To avoid this issue, you can prevent performance degradation by reordering documents after retrieval.\n",
    "\n",
    "Create a retriever that can store and search text data using the Chroma vector store.\n",
    "Use the retriever's invoke method to search for highly relevant documents for a given query.\n",
    "\n",
    "\n",
    "### Table of Contents\n",
    "\n",
    "- [Overview](#overview)\n",
    "- [Environment Setup](#environment-setup)\n",
    "- [Create an instance of the LongContextReorder class named reordering](#create-an-instance-of-the-longcontextreorder-class-named-reordering)\n",
    "- [Creating Question-Answering Chain with Context Reordering](#creating-question-answering-chain-with-context-reordering)\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f97b9226",
   "metadata": {},
   "source": [
    "## Environment Setup\n",
    "\n",
    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
    "\n",
    "**[Note]**\n",
    "- ```langchain-opentutorial``` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
    "- You can checkout the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55947067",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture --no-stderr\n",
    "!pip install langchain-opentutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "d28e6a9e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Configuration file for managing API keys as environment variables\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# Load API key information\n",
    "load_dotenv(override=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "889a6197",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "from langchain_opentutorial import package\n",
    "\n",
    "package.install(\n",
    "    [\n",
    "       \"langsmith\",\n",
    "        \"langchain\",\n",
    "        \"langchain_openai\",\n",
    "        \"langchain_community\",\n",
    "        \"langchain-chroma\",\n",
    "    ],\n",
    "    verbose=False,\n",
    "    upgrade=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "bc1ebfea",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Environment variables have been set successfully.\n"
     ]
    }
   ],
   "source": [
    "from langchain_opentutorial import set_env\n",
    "\n",
    "set_env(\n",
    "    {\n",
    "        # \"OPENAI_API_KEY\": \"\",\n",
    "        # \"LANGCHAIN_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_TRACING_V2\": \"true\",\n",
    "        \"LANGCHAIN_ENDPOINT\": \"https://api.smith.langchain.com\",\n",
    "        \"LANGCHAIN_PROJECT\": \"04-LongContextReorder\",\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8271a18",
   "metadata": {},
   "source": [
    "## Create an instance of the LongContextReorder class named reordering."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f461786",
   "metadata": {},
   "source": [
    "Enter a query for the retriever to perform the search."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5bb794e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_community.document_transformers import LongContextReorder\n",
    "from langchain_community.vectorstores import Chroma\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "\n",
    "# Get embeddings\n",
    "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
    "\n",
    "texts = [\n",
    "    \"This is just a random text I wrote.\",\n",
    "    \"ChatGPT, an AI designed to converse with users, can answer various questions.\",\n",
    "    \"iPhone, iPad, MacBook are representative products released by Apple.\",\n",
    "    \"ChatGPT was developed by OpenAI and is continuously being improved.\",\n",
    "    \"ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.\",\n",
    "    \"Wearable devices like Apple Watch and AirPods are also part of Apple's popular product line.\",\n",
    "    \"ChatGPT can be used to solve complex problems or suggest creative ideas.\",\n",
    "    \"Bitcoin is also called digital gold and is gaining popularity as a store of value.\",\n",
    "    \"ChatGPT's capabilities are continuously evolving through ongoing learning and updates.\",\n",
    "    \"The FIFA World Cup is held every four years and is the biggest event in international football.\",\n",
    "]\n",
    "\n",
    "\n",
    "\n",
    "# Create a retriever (Set K to 10)\n",
    "retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(\n",
    "    search_kwargs={\"k\": 10}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "a42887d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.')]"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"What can you tell me about ChatGPT?\"\n",
    "\n",
    "# Retrieves relevant documents sorted by relevance score.\n",
    "docs = retriever.invoke(query)\n",
    "docs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "041fe044",
   "metadata": {},
   "source": [
    "Create an instance of LongContextReorder class.\n",
    "\n",
    "- Call reordering.transform_documents(docs) to reorder the document list.\n",
    "- Less relevant documents are positioned in the middle of the list, while more relevant documents are positioned at the beginning and end.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "f1067a69",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),\n",
       " Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),\n",
       " Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.')]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Reorder the documents\n",
    "# Less relevant documents are positioned in the middle, more relevant elements at start/end\n",
    "reordering = LongContextReorder()\n",
    "reordered_docs = reordering.transform_documents(docs)\n",
    "\n",
    "# Verify that 4 relevant documents are positioned at start and end\n",
    "reordered_docs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d84086f",
   "metadata": {},
   "source": [
    "## Creating Question-Answering Chain with Context Reordering\n",
    "\n",
    "A chain that enhances QA (Question-Answering) performance by reordering documents using LongContextReorder, which optimizes the arrangement of context for better comprehension and response accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "da423187",
   "metadata": {},
   "outputs": [],
   "source": [
    "def format_docs(docs):\n",
    "    return \"\\n\".join([doc.page_content for i, doc in enumerate(docs)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "696b01bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ChatGPT was developed by OpenAI and is continuously being improved.\n",
      "ChatGPT was developed by OpenAI and is continuously being improved.\n",
      "ChatGPT was developed by OpenAI and is continuously being improved.\n",
      "ChatGPT was developed by OpenAI and is continuously being improved.\n",
      "ChatGPT was developed by OpenAI and is continuously being improved.\n",
      "ChatGPT, an AI designed to converse with users, can answer various questions.\n",
      "ChatGPT, an AI designed to converse with users, can answer various questions.\n",
      "ChatGPT, an AI designed to converse with users, can answer various questions.\n",
      "ChatGPT, an AI designed to converse with users, can answer various questions.\n",
      "ChatGPT, an AI designed to converse with users, can answer various questions.\n"
     ]
    }
   ],
   "source": [
    "print(format_docs(docs))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "228d3fcf",
   "metadata": {},
   "outputs": [],
   "source": [
    "def format_docs(docs):\n",
    "    return \"\\n\".join(\n",
    "        [\n",
    "            f\"[{i}] {doc.page_content} [source: teddylee777@gmail.com]\"\n",
    "            for i, doc in enumerate(docs)\n",
    "        ]\n",
    "    )\n",
    "\n",
    "\n",
    "def reorder_documents(docs):\n",
    "    # Reorder\n",
    "    reordering = LongContextReorder()\n",
    "    reordered_docs = reordering.transform_documents(docs)\n",
    "    combined = format_docs(reordered_docs)\n",
    "    print(combined)\n",
    "    return combined"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1af40e60",
   "metadata": {},
   "source": [
    "Prints the reordered documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "37cd3eb0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[1] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[2] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]\n",
      "[3] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]\n",
      "[4] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]\n",
      "[5] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]\n",
      "[6] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]\n",
      "[7] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[8] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[9] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n"
     ]
    }
   ],
   "source": [
    "# Define prompt template\n",
    "_ = reorder_documents(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "ca7bc3fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts import ChatPromptTemplate\n",
    "from operator import itemgetter\n",
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.runnables import RunnableLambda\n",
    "\n",
    "# Define prompt template\n",
    "template = \"\"\"Given this text extracts:\n",
    "{context}\n",
    "\n",
    "-----\n",
    "Please answer the following question:\n",
    "{question}\n",
    "\n",
    "Answer in the following languages: {language}\n",
    "\"\"\"\n",
    "\n",
    "# Define prompt\n",
    "prompt = ChatPromptTemplate.from_template(template)\n",
    "\n",
    "# Define Chain\n",
    "chain = (\n",
    "    {\n",
    "        \"context\": itemgetter(\"question\")\n",
    "        | retriever\n",
    "        | RunnableLambda(reorder_documents),  # Search context based on question\n",
    "        \"question\": itemgetter(\"question\"),  # Extract question\n",
    "        \"language\": itemgetter(\"language\"),  # Extract answer language\n",
    "    }\n",
    "    | prompt  # Pass values to prompt template\n",
    "    | ChatOpenAI(model=\"gpt-4o-mini\")  # Pass prompt to language model\n",
    "    | StrOutputParser()  # Parse model output as string\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9686eb7e",
   "metadata": {},
   "source": [
    "\n",
    "Enter the query in question and language for response.\n",
    "\n",
    "Check the search results of reordered documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "3a34707a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]\n",
      "[1] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]\n",
      "[2] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[3] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[4] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[5] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[6] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]\n",
      "[7] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]\n",
      "[8] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]\n",
      "[9] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]\n"
     ]
    }
   ],
   "source": [
    "answer = chain.invoke(\n",
    "    {\"question\": \"What can you tell me about ChatGPT?\", \"language\": \"English\"}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f58119d0",
   "metadata": {},
   "source": [
    "Prints the response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "fc86e520",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ChatGPT is an AI language model developed by OpenAI. Its capabilities are continuously evolving through ongoing learning and updates, which means it is regularly improved to enhance its performance and functionality. The model is designed to understand and generate human-like text, making it useful for a variety of applications such as conversational agents, content creation, and more.\n"
     ]
    }
   ],
   "source": [
    "print(answer)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain-opentutorial-LLYV5Gls-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
